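Recently, a family of locally balanced (LB) samplers has demonstrated excellent performance in sampling and learning energy-based models (EBMs) in discrete spaces. However, the theoretical understanding of this success is limited. In this work, we show how LB functions give rise to LB dynamics corresponding to a Wasserstein gradient flow in a discrete space. From first principles, previous LB samplers can then be seen as discretizations of the LB dynamics with respect to Hamming distance. Based on this observation, we propose a new algorithm, the Locally Balanced Jump (LBJ), by discretizing the LB dynamics with respect to simulation time. As a result, LBJ has a position-dependent "velocity" that allows it to make proposals over larger distances. Moreover, LBJ decouples each dimension into independent sub-processes, enabling a convenient parallel implementation. We demonstrate the advantages of LBJ for sampling and learning on various binary and categorical distributions.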
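The abstract characterizes earlier LB samplers as discretizations of the LB dynamics with respect to Hamming distance. As a minimal sketch of that earlier family (not of LBJ, whose time discretization is not specified here), the Python below implements a locally balanced Metropolis-Hastings step over single-bit flips on a binary space, using the balancing function g(t) = sqrt(t), which satisfies g(t) = t * g(1/t). The function names and the toy target used to exercise it are illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Tuple

State = Tuple[int, ...]


def sqrt_balance(t: float) -> float:
    """Balancing function g(t) = sqrt(t); it satisfies g(t) = t * g(1/t)."""
    return math.sqrt(t)


def flip(x: State, i: int) -> State:
    """Return a copy of the binary state x with bit i flipped (Hamming distance 1)."""
    y = list(x)
    y[i] = 1 - y[i]
    return tuple(y)


def lb_weights(x: State, log_prob: Callable[[State], float],
               g: Callable[[float], float] = sqrt_balance) -> List[float]:
    """Unnormalized proposal weights g(pi(y)/pi(x)) for every single-bit neighbour y."""
    lp_x = log_prob(x)
    return [g(math.exp(log_prob(flip(x, i)) - lp_x)) for i in range(len(x))]


def lb_step(x: State, log_prob: Callable[[State], float], rng: random.Random,
            g: Callable[[float], float] = sqrt_balance) -> State:
    """One locally balanced Metropolis-Hastings step inside the Hamming ball of radius 1."""
    w_x = lb_weights(x, log_prob, g)
    i = rng.choices(range(len(x)), weights=w_x)[0]
    y = flip(x, i)
    w_y = lb_weights(y, log_prob, g)
    # Forward proposal q(y|x) = w_x[i] / sum(w_x); reverse q(x|y) = w_y[i] / sum(w_y),
    # because flipping bit i of y returns to x.
    log_ratio = (log_prob(y) - log_prob(x)
                 + math.log(w_y[i] / sum(w_y)) - math.log(w_x[i] / sum(w_x)))
    # Accept with probability min(1, ratio); min() keeps exp() from overflowing.
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return y
    return x
```

For any balancing function with g(t) = t * g(1/t), the acceptance ratio in this sketch simplifies to Z(x)/Z(y), where Z sums the neighbour weights; g(t) = t/(1+t) is another valid choice.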
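A Monge map is the optimal transport map between two probability distributions and provides a principled way to transform one distribution into another. Despite the rapid development of numerical methods for optimal transport problems, computing Monge maps remains challenging, especially for high-dimensional problems. In this paper, we present a scalable algorithm for computing the Monge map between two probability distributions. Our algorithm is based on a weak form of the optimal transport problem, so it requires only samples from the marginals rather than their analytic expressions, and it can accommodate optimal transport between two distributions of different dimensions. Our algorithm is suitable for general cost functions, in contrast to other existing methods for estimating Monge maps from samples, which are usually designed for quadratic costs. The performance of our algorithm is demonstrated through a series of experiments on both synthetic and real data.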
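The abstract does not describe its weak-form algorithm in enough detail to reproduce. As a point of reference only, the Python sketch below computes a Monge map in the simplest sample-based setting: two uniform empirical measures with the same number of points, where (by the Birkhoff-von Neumann theorem) an optimal coupling can be taken to be a permutation. It accepts a general cost function, and source and target points may have different dimensions as long as the cost can compare them. The brute-force search over permutations and all names are illustrative assumptions; this is not the scalable method described above and only works for a handful of points.

```python
import itertools
from typing import Callable, Dict, Sequence, Tuple

Point = Sequence[float]


def sq_euclidean(x: Point, y: Point) -> float:
    """Quadratic cost c(x, y) = |x - y|^2 (points must share a dimension)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def discrete_monge_map(xs: Sequence[Point], ys: Sequence[Point],
                       cost: Callable[[Point, Point], float] = sq_euclidean
                       ) -> Tuple[Dict[int, int], float]:
    """Brute-force Monge map between two uniform empirical measures of equal size.

    Returns (mapping from source index to target index, average transport cost).
    """
    if not xs or len(xs) != len(ys):
        raise ValueError("need the same, nonzero number of source and target samples")
    best_perm, best_total = None, float("inf")
    # Each permutation is a candidate map x_i -> y_perm[i]; keep the cheapest one.
    for perm in itertools.permutations(range(len(ys))):
        total = sum(cost(xs[i], ys[j]) for i, j in enumerate(perm))
        if total < best_total:
            best_perm, best_total = perm, total
    return dict(enumerate(best_perm)), best_total / len(xs)
```

For larger equal-size samples, the same assignment problem can be solved in polynomial time with a Hungarian-type solver such as `scipy.optimize.linear_sum_assignment`.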
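Self-supervised learning (SSL) has achieved remarkable performance in pretraining models that can be further adapted to downstream tasks through fine-tuning. However, these self-supervised models may not capture meaningful semantic information, since images belonging to the same class are always treated as negative pairs in the contrastive loss. Consequently, images of the same class often lie far from each other in the learned feature space, which inevitably hampers the fine-tuning process. To address this issue, we seek to provide a better initialization for self-supervised models by enhancing their semantic information. To this end, we propose a Contrastive Initialization (COIN) method that breaks the standard fine-tuning pipeline by introducing an extra initialization stage before fine-tuning. Extensive experiments show that, with the enriched semantics, our COIN significantly outperforms existing methods without introducing extra training cost and sets a new state of the art on multiple downstream tasks.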
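The abstract does not give COIN's initialization objective. One standard way to stop same-class images from acting as negatives, offered here purely as an assumption and not as COIN itself, is a supervised contrastive loss (Khosla et al., 2020), in which every other sample that shares the anchor's label counts as a positive. The pure-Python sketch below computes that loss for a small batch; the function names and the default temperature of 0.1 are illustrative choices.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def supervised_contrastive_loss(embeddings: Sequence[Sequence[float]],
                                labels: Sequence[int],
                                temperature: float = 0.1) -> float:
    """Supervised contrastive loss: same-label samples are positives, all others negatives."""
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    # Cosine similarities scaled by the temperature.
    sim = [[sum(a * b for a, b in zip(z[i], z[j])) / temperature for j in range(n)]
           for i in range(n)]
    per_anchor = []
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        # Denominator runs over every other sample in the batch (positives and negatives).
        log_denom = math.log(sum(math.exp(sim[i][a]) for a in range(n) if a != i))
        per_anchor.append(-sum(sim[i][p] - log_denom for p in positives) / len(positives))
    if not per_anchor:
        raise ValueError("batch needs at least two samples sharing a label")
    return sum(per_anchor) / len(per_anchor)
```

Used as an extra stage before fine-tuning, a loss of this kind pulls same-class features together, which is the property the abstract says instance-level contrastive pretraining lacks.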
Human group detection, which splits a crowd of people into groups, is an important step in video-based analysis of human social activity. The core of human group detection lies in representing and dividing human social relations. In this paper, we propose a new two-stage multi-head framework for human group detection. In the first stage, we propose a human behavior simulator head to learn the social relation feature embedding, which is trained in a self-supervised manner by leveraging socially grounded multi-person behavior relationships. In the second stage, based on the social relation embedding, we develop a self-attention-inspired network for human group detection. Remarkable performance on two state-of-the-art large-scale benchmarks, PANDA and JRDB-Group, verifies the effectiveness of the proposed framework. Benefiting from the self-supervised social relation embedding, our method can provide promising results with very little (labeled) training data. We will release the source code to the public.
To obtain a more comprehensive understanding of activity in crowded scenes, in this paper we propose a new problem, panoramic human activity recognition (PAR), which aims to simultaneously achieve individual action, social group activity, and global activity recognition. This is a challenging yet practical problem in real-world applications. For this problem, we develop a novel hierarchical graph neural network to progressively represent and model multi-granularity human activities and mutual social relations in a crowd of people. We further build a benchmark to evaluate the proposed method and other existing related methods. Experimental results verify the soundness of the proposed PAR problem, the effectiveness of our method, and the usefulness of the benchmark. We will release the source code and benchmark to the public to promote the study of this problem.
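Transparency in machine learning (ML) attempts to reveal the working mechanisms of complex models. Transparent ML promises to advance the human factors engineering goals of human-centered AI for target users. From a human-centered design perspective, transparency is not a property of the ML model but an affordance, i.e., a relationship between algorithm and user; as a result, iterative prototyping and evaluation with users are critical to attaining adequate solutions that afford transparency. However, following human-centered design principles in healthcare and medical image analysis is challenging because of the limited availability of, and access to, end users. To investigate the state of transparent ML in medical image analysis, we conducted a systematic review of the literature. Our review reveals multiple severe shortcomings in the design and validation of transparent ML for medical image analysis applications. We find that most studies to date approach transparency as a property of the model itself, similar to task performance, without considering end users during either development or evaluation. Additionally, the lack of user research and the sporadic validation of transparency claims put contemporary research on transparent ML for medical image analysis at risk of being incomprehensible to users, and thus clinically irrelevant. To alleviate these shortcomings in forthcoming research while acknowledging the challenges of human-centered design in healthcare, we introduce the INTRPRT guideline, a systematic design directive for transparent ML systems in medical image analysis. The INTRPRT guideline suggests formative user research as the first step of transparent model design, to understand user needs and domain requirements. Following this process produces evidence to support design choices and ultimately increases the likelihood that the algorithms afford transparency.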
Benefiting from its ability to exploit intrinsic supervision information, contrastive learning has recently achieved promising performance in deep graph clustering. However, we observe that two drawbacks of the positive and negative sample construction mechanisms prevent existing algorithms from improving further. 1) The quality of positive samples depends heavily on carefully designed data augmentations, and inappropriate data augmentations easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are unreliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) that mines the intrinsic supervision information in high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct positive samples from the same high-confidence cluster in the two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function that pulls together samples from the same cluster and pushes apart those from other clusters by maximizing the cross-view cosine similarity between positive samples and minimizing it between negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with existing state-of-the-art algorithms.
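The abstract names the ingredients of CCGC's objective (positives from the same high-confidence cluster across two views, centers of different high-confidence clusters as negatives, cross-view cosine similarity) but not its exact formula. The Python sketch below combines those ingredients in one plausible way, as an assumption: it rewards cosine similarity between the two view embeddings of each high-confidence node and penalizes cosine similarity between centers of different clusters taken from opposite views. The confidence threshold, the equal weighting of the two terms, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_vector(vectors: List[Vector]) -> List[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cluster_guided_loss(view1: Sequence[Vector], view2: Sequence[Vector],
                        assignments: Sequence[int], confidences: Sequence[float],
                        threshold: float = 0.8) -> float:
    """Hypothetical cluster-guided cross-view loss (lower is better)."""
    confident = [i for i, c in enumerate(confidences) if c >= threshold]
    if not confident:
        raise ValueError("no node passes the confidence threshold")
    # Positive term: a high-confidence node seen in both views should agree with itself.
    pos = sum(1.0 - cosine(view1[i], view2[i]) for i in confident) / len(confident)
    # Cluster centers from high-confidence nodes only, computed separately per view.
    members: Dict[int, List[int]] = {}
    for i in confident:
        members.setdefault(assignments[i], []).append(i)
    c1 = {k: mean_vector([view1[i] for i in idx]) for k, idx in members.items()}
    c2 = {k: mean_vector([view2[i] for i in idx]) for k, idx in members.items()}
    # Negative term: centers of different clusters, across views, should disagree.
    pairs = [(ka, kb) for ka in members for kb in members if ka != kb]
    neg = sum(cosine(c1[ka], c2[kb]) for ka, kb in pairs) / len(pairs) if pairs else 0.0
    return pos + neg
```

Restricting both terms to high-confidence nodes is what makes the sample pairs "cluster-guided" in this sketch; low-confidence nodes simply do not contribute.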
As one of the prevalent methods for building automated systems, Imitation Learning (IL) has shown promising performance in a wide range of domains. However, despite considerable improvements in policy performance, research on the explainability of IL models remains limited. Inspired by recent approaches in explainable artificial intelligence, we propose a model-agnostic explanation framework for IL models called R2RISE. R2RISE aims to explain overall policy performance with respect to the frames in the demonstrations. It iteratively retrains the black-box IL model on randomly masked demonstrations and uses the conventional evaluation outcome, the environment return, as the coefficient for building an importance map. We also conducted experiments to investigate three major questions: whether frames are equally important, how effective the importance map is, and how importance maps from different IL models are connected. The results show that R2RISE successfully distinguishes important frames in the demonstrations.
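The procedure in the abstract (randomly mask demonstration frames, retrain the black-box IL model, weight each mask by the resulting environment return) follows the RISE recipe for saliency maps. The Python sketch below shows only that aggregation step, under stated assumptions: `train_and_evaluate` is a hypothetical callback standing in for retraining the IL model on the kept frames and returning its environment return, and each frame is kept independently with probability `keep_prob`. Dividing by the number of masks times `keep_prob` follows the original RISE formulation; R2RISE's exact normalization may differ.

```python
import random
from typing import Callable, List, Sequence


def random_frame_mask(n_frames: int, keep_prob: float, rng: random.Random) -> List[int]:
    """Binary mask over demonstration frames: 1 keeps a frame, 0 drops it."""
    return [1 if rng.random() < keep_prob else 0 for _ in range(n_frames)]


def frame_importance(n_frames: int,
                     train_and_evaluate: Callable[[Sequence[int]], float],
                     n_masks: int = 1000, keep_prob: float = 0.5,
                     seed: int = 0) -> List[float]:
    """RISE-style importance map: return-weighted average of random frame masks."""
    rng = random.Random(seed)
    scores = [0.0] * n_frames
    for _ in range(n_masks):
        mask = random_frame_mask(n_frames, keep_prob, rng)
        # Hypothetical callback: retrain on the kept frames, report the environment return.
        ret = train_and_evaluate(mask)
        for j, kept in enumerate(mask):
            scores[j] += ret * kept
    # A frame's score is high when masks that keep it tend to yield high returns.
    return [s / (n_masks * keep_prob) for s in scores]
```

Because every call to `train_and_evaluate` implies a full retraining run, `n_masks` is the main cost knob in practice.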
Text clustering and topic extraction are two important tasks in text mining. Usually, these two tasks are performed separately. To use topic extraction to facilitate clustering, we can first project texts into a topic space and then run a clustering algorithm to obtain clusters. To use clustering to promote topic extraction, we can first obtain clusters with a clustering algorithm and then extract cluster-specific topics. However, this naive strategy ignores the fact that text clustering and topic extraction are strongly correlated and have a chicken-and-egg relationship. Performing them separately prevents them from benefiting each other to achieve the best overall performance. In this paper, we propose an unsupervised text clustering and topic extraction framework (ClusTop) that integrates text clustering and topic extraction into a unified framework, producing high-quality clustering results and extracting topics from each cluster simultaneously. Our framework includes four components: enhanced language model training, dimensionality reduction, clustering, and topic extraction, where the enhanced language model can be viewed as a bridge between clustering and topic extraction. On the one hand, it provides text embeddings with a strong cluster structure, which facilitates effective text clustering; on the other hand, thanks to its self-attention architecture, it pays close attention to topic-related words, which aids topic extraction. Moreover, the enhanced language model is trained without supervision. Experiments on two datasets demonstrate the effectiveness of our framework and provide benchmarks for different model combinations within this framework.
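The abstract attributes topic-word selection to the language model's self-attention but does not give the extraction formula. As a simpler, swapped-in stand-in (not ClusTop's attention-based method), the Python sketch below scores words per cluster with a class-based TF-IDF, similar in spirit to the one used by BERTopic: a word ranks high for a cluster when it is frequent there but rare across all clusters. Whitespace tokenization and the score tf * log(1 + A / f), where A is the average number of words per cluster and f is the word's total frequency, are assumptions of this sketch.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def cluster_topics(docs: Sequence[str], clusters: Sequence[int],
                   top_k: int = 3) -> Dict[int, List[str]]:
    """Top-k topic words per cluster using a class-based TF-IDF score."""
    # Term frequencies per cluster: each cluster is treated as one big document.
    tf: Dict[int, Counter] = {}
    for doc, c in zip(docs, clusters):
        tf.setdefault(c, Counter()).update(doc.lower().split())
    total_freq: Counter = Counter()
    for counts in tf.values():
        total_freq.update(counts)
    avg_words = sum(total_freq.values()) / len(tf)  # A: average words per cluster
    topics: Dict[int, List[str]] = {}
    for c, counts in tf.items():
        scores = {w: n * math.log(1 + avg_words / total_freq[w]) for w, n in counts.items()}
        # Highest score first; ties broken alphabetically for deterministic output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        topics[c] = [w for w, _ in ranked[:top_k]]
    return topics
```

In a pipeline like the one described, `clusters` would come from the clustering step run on the language-model embeddings.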
An increasing number of public datasets have shown a marked clinical impact on assessing anatomical structures. However, each of these datasets is small, partially labeled, and rarely covers subjects with severe tumors. Moreover, current models are limited to segmenting specific organs or tumors and cannot be extended to novel domains and classes. To tackle these limitations, we introduce embeddings learned from Contrastive Language-Image Pre-training (CLIP) into segmentation models, yielding what we call the CLIP-Driven Universal Model. The Universal Model better segments 25 organs and 6 types of tumors by exploiting the semantic relationships between abdominal structures. The model is developed from an assembly of 14 datasets with 3,410 CT scans and evaluated on 6,162 external CT scans from 3 datasets. We rank first on the public leaderboard of the Medical Segmentation Decathlon (MSD) and achieve state-of-the-art results on Beyond The Cranial Vault (BTCV). Compared with dataset-specific models, the Universal Model is computationally more efficient (6x faster), generalizes better to CT scans from different sites, and shows stronger transfer learning performance on novel tasks. The CLIP embedding design enables the Universal Model to be easily extended to new classes without catastrophically forgetting previously learned classes.